Abstract

High-level programming languages play a key role in a growing
number of networking platforms, streamlining application development
and enabling precise formal reasoning about network behavior.
Unfortunately, current compilers only handle “local” programs
that specify behavior in terms of hop-by-hop forwarding behavior,
or modest extensions such as simple paths. To encode richer
“global” behaviors, programmers must add extra state, something
that is tricky to get right and makes programs harder to write
and maintain. Making matters worse, existing compilers can take
tens of minutes to generate the forwarding state for the network,
even on relatively small inputs. This forces programmers to waste
time working around performance issues or even revert to using
hardware-level APIs.

This paper presents a new compiler for the NetKAT language
that handles rich features including regular paths and virtual networks,
and yet is several orders of magnitude faster than previous
compilers. The compiler uses symbolic automata to calculate the
extra state needed to implement “global” programs, and an intermediate
representation based on binary decision diagrams to dramatically
improve performance. We describe the design and implementation
of three essential compiler stages: from virtual programs
(which specify behavior in terms of virtual topologies) to global
programs (which specify network-wide behavior in terms of physical
topologies), from global programs to local programs (which
specify behavior in terms of single-switch behavior), and from local
programs to hardware-level forwarding tables. We present results
from experiments on real-world benchmarks that quantify performance
in terms of compilation time and forwarding table size.

Categories and Subject Descriptors D.3.4 [Programming Languages]:
Processors - Compilers

Keywords Software-defined networking, domain-specific languages,
NetKAT, Frenetic, Kleene Algebra with Tests, virtualization,
binary decision diagrams.

1. Introduction 

High-level languages are playing a key role in a growing number
of networking platforms being developed in academia and industry.
There are many examples: VMware uses nlog, a declarative
language based on Datalog, to implement network virtualization 03;
SDX uses Pyretic to combine programs provided by different
participants at Internet exchange points [13, 25]; PANE uses
NetCore to allow end-hosts to participate in network management
decisions 131241; Flowlog offers tierless abstractions based on
Datalog [26]; Maple allows packet-processing functions to be
specified directly in Haskell or Java [33]; OpenDaylight’s group-based
policies describe the state of the network in terms of
application-level connectivity requirements [29]; and ONOS provides an
“intent framework” that encodes constraints on end-to-end paths [28].


* Work performed at Cornell University. 


The details of these languages differ, but they all offer abstractions
that enable thinking about the behavior of a network in terms
of high-level constructs such as packet-processing functions rather 
than low-level switch configurations. To bridge the gap between 
these abstractions and the underlying hardware, the compilers for 
these languages map source programs into forwarding rules that 
can be installed in the hardware tables maintained by software- 
defined networking (SDN) switches. 

Unfortunately, most compilers for SDN languages only handle
“local” programs in which the intended behavior of the network is
specified in terms of hop-by-hop processing on individual switches.
A few support richer features such as end-to-end paths and network
virtualization 119112811^, but to the best of our knowledge,
no prior work has presented a complete description of the algorithms
one would use to generate the forwarding state needed to
implement these features. For example, although NetKAT includes
primitives that can be used to succinctly specify global behaviors
including regular paths, the existing compiler only handles a local
fragment [4]. This means that programmers can only use a restricted
subset that is strictly less expressive than the full language
and must manually manage the state needed to implement
network-wide paths, virtual networks, and other similar features.

Another limitation of current compilers is that they are based on 
algorithms that perform poorly at scale. For example, the NetCore, 
NetKAT, PANE, and Pyretic compilers use a simple translation to 
forwarding tables, where primitive constructs are mapped directly 
to small tables and other constructs are mapped to algebraic operators
on forwarding tables. This approach quickly becomes impractical,
as the size of the generated tables can grow exponentially with
the size of the program. This is a problem for platforms that rely
on high-level languages to express control application logic, as a 
slow compiler can hinder the ability of the platform to effectively 
monitor and react to changing network state. 

Indeed, to work around the performance issues in the current
Pyretic compiler, the developers of SDX [13] extended the language
in several ways, including adding a new low-cost composition operator
that implements the disjoint union of packet-processing functions.
The idea was that the implementation of the disjoint union
operator could use a linear algorithm that simply concatenates the
forwarding tables for each function rather than using the usual
quadratic algorithm that does an all-pairs intersection between the
entries in each table. However, even with this and other optimizations,
the Pyretic compiler still took tens of minutes to generate the
forwarding state for inputs of modest size.

Our approach. This paper presents a new compiler pipeline for 
NetKAT that handles local programs executing on a single switch, 
global programs that utilize the full expressive power of the language,
and even programs written against virtual topologies. The
algorithms that make up this pipeline are orders of magnitude faster
than previous approaches. For example, our system takes two seconds to
compile the largest SDX benchmarks, versus several minutes in
Pyretic, and other benchmarks demonstrate that our compiler is 
able to handle large inputs far beyond the scope of its competitors. 



These results stem from a few key insights. First, to compile local
programs, we exploit a novel intermediate representation based
on binary decision diagrams (BDDs). This representation avoids
the combinatorial explosion inherent in approaches based on forwarding
tables and allows our compiler to leverage well-known
techniques for representing and transforming BDDs. Second, to
compile global programs, we use a generalization of symbolic
automata HD to handle the difficult task of generating the state
needed to correctly implement features such as regular forwarding
paths. Third, to compile virtual programs, we exploit the additional
expressiveness provided by the global compiler to translate
programs on a virtual topology into programs on the underlying
physical topology.

We have built a full working implementation of our compiler
in OCaml, and designed optimizations that reduce compilation
time and the size of the generated forwarding tables. These optimizations
are based on general insights related to BDDs (sharing
common structures, rewriting naive recursive algorithms using dynamic
programming, using heuristic field orderings, etc.) as well
as insights specific to SDN (algebraic optimization
of NetKAT programs, per-switch specialization, etc.). To evaluate
the performance of our compiler, we present results from experiments
run on a variety of benchmarks. These experiments demonstrate
that our compiler provides improved performance, scales to
networks with tens of thousands of switches, and easily handles
complex features such as virtualization.

Overall, this paper makes the following contributions: 

• We present the first complete compiler pipeline for NetKAT that 
translates local, global, and virtual programs into forwarding 
tables for SDN switches. 

• We develop a generalization of BDDs and show how to implement
a local SDN compiler using this data structure as an intermediate
representation.

• We describe compilation algorithms for virtual and global programs
based on graph algorithms and symbolic automata.

• We discuss an implementation in OCaml and develop optimizations
that reduce running time and the size of the generated forwarding
tables.

• We conduct experiments that show dramatic improvements over 
other compilers on a collection of benchmarks and case studies. 

The next section briefly reviews the NetKAT language and discusses 
some challenges related to compiling SDN programs, to set the 
stage for the results described in the following sections. 

2. Overview 

NetKAT is a domain-specific language for specifying and reasoning
about networks [4, 11]. It offers primitives for matching and modifying
packet headers, as well as combinators such as union and sequential
composition that merge smaller programs into larger ones.
NetKAT is based on a solid mathematical foundation, Kleene Algebra
with Tests (KAT) [20], and comes equipped with an equational
reasoning system that can be used to automatically verify many
properties of programs [11].

NetKAT enables programmers to think in terms of functions on
packet histories, where a packet (pk) is a record of fields and a history
(h) is a non-empty list of packets. This is a dramatic departure
from hardware-level APIs such as OpenFlow, which require thinking
about low-level details such as forwarding table rules, matches,
priorities, actions, timeouts, etc. NetKAT fields f include standard
packet headers such as Ethernet source and destination addresses,
VLAN tags, etc., as well as special fields to indicate the port (pt)
and switch (sw) where the packet is located in the network. For
brevity, we use src and dst fields in examples, though our compiler
implements all of the standard fields supported by OpenFlow [23].

Figure 1: NetKAT syntax and semantics.

NetKAT syntax and semantics. Formally, NetKAT is defined by
the syntax and semantics given in Figure 1. Predicates a describe
logical predicates on packets and include primitive tests f=n,
which check whether field f is equal to n, as well as the standard
collection of boolean operators. This paper focuses on tests that
match fields exactly, although our implementation supports generalized
tests, such as IP prefix matches. Programs p can be understood
as packet-processing functions that consume a packet history
and produce a set of packet histories. Filters a drop packets that do
not satisfy a; modifications f←n update the f field to n; unions
p + q copy the input packet and process one copy using p, the other
copy using q, and take the union of the results; sequences p • q process
the input packet using p and then feed each output of p into
q (the • operator is Kleisli composition); iterations p* behave like
the union of p composed with itself zero or more times; and dups
extend the trajectory recorded in the packet history by one hop.
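
The packet-level part of these semantics can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: packets are immutable sets of (field, value) pairs, histories are ignored (so dup is omitted), and the tuple-based syntax tree and the names `run` and `packet` are hypothetical rather than taken from our implementation.

```python
# Programs are nested tuples; packets are frozensets of (field, value) pairs.
def run(p, pk):
    """Return the set of packets produced by program p on input packet pk."""
    tag = p[0]
    if tag == "true":
        return {pk}
    if tag == "false":
        return set()
    if tag == "test":                       # f = n
        _, f, n = p
        return {pk} if dict(pk).get(f) == n else set()
    if tag == "not":                        # negation of a predicate
        return set() if run(p[1], pk) else {pk}
    if tag == "mod":                        # f <- n
        _, f, n = p
        fields = dict(pk)
        fields[f] = n
        return {frozenset(fields.items())}
    if tag == "union":                      # p + q
        return run(p[1], pk) | run(p[2], pk)
    if tag == "seq":                        # p . q (Kleisli composition)
        return {out for mid in run(p[1], pk) for out in run(p[2], mid)}
    if tag == "star":                       # p*: iterate until no new packets
        seen, frontier = {pk}, {pk}
        while frontier:
            frontier = {out for q in frontier for out in run(p[1], q)} - seen
            seen |= frontier
        return seen
    raise ValueError(f"unknown construct: {tag}")

def packet(**fields):
    return frozenset(fields.items())

# (pt=1 . pt<-2) + (pt=2 . pt<-1)
swap = ("union",
        ("seq", ("test", "pt", 1), ("mod", "pt", 2)),
        ("seq", ("test", "pt", 2), ("mod", "pt", 1)))
```

Running `run(swap, packet(pt=1, dst="A"))` yields the single packet with pt set to 2, while a packet arriving on any other port is dropped. The star case terminates because only finitely many packets are reachable from the input.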

Topology encoding. Readers who are familiar with Frenetic cni, 
Pyretic [25], or NetCore El, will be familiar with the basic details
of this functional packet-processing model. However, unlike these
languages, NetKAT can also model the behavior of the entire network,
including its topology. For example, a (unidirectional) link
from port pt₁ on switch sw₁ to port pt₂ on switch sw₂ can be
encoded in NetKAT as follows:

dup • sw=sw₁ • pt=pt₁ • sw←sw₂ • pt←pt₂ • dup

Applying this pattern, the entire topology can be encoded as a
union of links. Throughout this paper, we will use the shorthand
[sw₁:pt₁]→[sw₂:pt₂] to indicate links, and assume that dup and
modifications to the switch field occur only in links.

Local programs. Since NetKAT can encode both the network
topology and the behavior of switches, a NetKAT program describes
the end-to-end behavior of a network. One simple way to write
NetKAT programs is to define predicates that describe where packets
enter (in) and exit (out) the network, and interleave steps of
processing on switches (p) and topology (t):

in • (p • t)* • p • out

To execute the program, only p needs to be specified: the physical
topology implements in, t, and out. Because no switch modifications
or dups occur in p, it can be directly compiled to a collection
of forwarding tables, one for each switch. Provided the physical
topology is faithful to the encoding specified by in, t, and out, a
network of switches populated with these forwarding tables will
behave like the above program. We call such a switch program p a
local program because it describes the behavior of the network in
terms of hop-by-hop forwarding steps on individual switches.

Global programs. Because NetKAT is based on Kleene algebra,
it includes regular expressions, which are a natural and expressive
formalism for describing paths through a network. Ideally, programmers
would be able to use regular expressions to construct
forwarding paths directly, without having to worry about how those
paths were implemented. For example, a programmer might write
the following to forward packets from port 1 on switch sw₁ to port
1 on switch sw₂, and from port 2 on sw₁ to port 2 on sw₂, assuming
a link connecting the two switches on port 3:

pt=1 • pt←3 • [sw₁:3]→[sw₂:3] • pt←1
+ pt=2 • pt←3 • [sw₁:3]→[sw₂:3] • pt←2

Note that this is not a local program, since it is not written in the general
form given above and instead combines switch processing and
topology processing using a particular combination of union and
sequential composition to describe a pair of overlapping forwarding
paths. To express the same behavior as a local NetKAT program
or in a language such as Pyretic, we would have to somehow write
a single program that specifies the processing that should be done
at each intermediate step. The challenge is that when sw₂ receives
a packet from sw₁, it needs to determine if that packet originated at
port 1 or 2 of sw₁, but this can’t be done without extra information.
For example, the compiler could add a tag to packets at sw₁ to track
the original ingress and use this information to determine the processing
at sw₂. In general, the expressiveness of global programs
creates challenges for the compiler, which must generate explicit
code to create and manipulate tags. These challenges have not been
met in previous work on NetKAT or other SDN languages.
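
To make the tagging idea concrete, here is a small hand-written Python sketch of the two-switch example. It is not the output of our compiler: the `tag` field, the dictionary representation of packets, and the function names are all assumptions made for illustration.

```python
def sw1_rules(pk):
    """Local rules at sw1: record the ingress port in a tag, then send out port 3."""
    if pk["sw"] == 1 and pk["pt"] in (1, 2):
        return [{**pk, "tag": pk["pt"], "pt": 3}]
    return []

def link(pk):
    """The link from port 3 of sw1 to port 3 of sw2."""
    if pk["sw"] == 1 and pk["pt"] == 3:
        return [{**pk, "sw": 2, "pt": 3}]
    return []

def sw2_rules(pk):
    """Local rules at sw2: the tag, not the arrival port, selects the egress port."""
    if pk["sw"] == 2 and pk["pt"] == 3 and pk.get("tag") in (1, 2):
        return [{**pk, "pt": pk["tag"]}]
    return []

def network(pk):
    """Compose the hop at sw1, the link, and the hop at sw2."""
    return [c for a in sw1_rules(pk) for b in link(a) for c in sw2_rules(b)]
```

A packet entering sw₁ on port 1 leaves sw₂ on port 1, and likewise for port 2, even though both packets arrive at sw₂ on port 3. Without the tag, the rules at sw₂ would have no way to tell the two cases apart.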

Figure 3: NetKAT compiler pipeline.

Virtual programs. Going a step further, NetKAT can also be used
to specify behavior in terms of virtual topologies. To see why this
is a useful abstraction, suppose that we wish to implement
point-to-point connectivity between a given pair of hosts in a network
with dozens of switches. One could write a global program that
explicitly forwards along the path between these hosts. But this
would be tedious for the programmer, since they would have to
enumerate all of the intermediate switches along the path. A better
approach is to express the program in terms of a virtual “big switch”
topology whose ports are directly connected to the hosts, and where
the relationship between ports in the virtual and physical networks
is specified by an explicit mapping (e.g., the top of Figure 3
depicts a big switch virtual topology). The desired functionality
could then be specified using a simple local program that forwards
in both directions between ports on the single virtual switch:

p = (pt=1 • pt←2) + (pt=2 • pt←1)

This one-switch virtual program is evidently much easier to write 
than a program that has to reference dozens of switches. In addition, 
the program is robust to changes in the underlying network. If the 
operator adds new switches to the network or removes switches 
for maintenance, the program remains valid and does not need to 
be rewritten. In fact, this program could be ported to a completely 
different physical network too, provided it is able to implement the 
same virtual topology. 

Another feature of virtualization is that the physical-virtual 
mapping can limit access to certain switches, ports, and even 
packets that match certain predicates, providing a simple form of 
language-based isolation (H. In this example, suppose the physical
network has hundreds of connected hosts. Yet, since the
virtual-physical mapping only exposes two ports, the abstraction guarantees
that the virtual program is isolated from the hosts connected
to the other ports. Moreover, we can run several isolated virtual 
networks on the same physical network, e.g., to provide different 
services to different customers in multi-tenant datacenters O. 

Of course, while virtual programs are a powerful abstraction, 
they create additional challenges for the compiler since it must 
generate physical paths that implement forwarding between virtual 
ports and also instrument programs with extra bookkeeping information
to keep track of the locations of virtual packets traversing
the physical network. Although virtualization has been extensively 
studied in the networking community (3l|7l[Il|25l, no previous 
work fully describes how to compile virtual programs. 
Figure 2: Compiling using forwarding tables.


Compilation pipeline. This paper presents new algorithms for
compiling NetKAT that address the key challenges related to expressiveness
and performance just discussed. Figure 3 depicts
the overall architecture of our compiler, which is structured as a
pipeline with several smaller stages: (i) a virtual compiler that
takes as input a virtual program v, a virtual topology, and a mapping
that specifies the relationship between the virtual and physical
topology, and emits a global program that uses a fabric to transit
between virtual ports using physical paths; (ii) a global compiler
that takes an arbitrary NetKAT program g as input and emits a local
program that has been instrumented with extra state to keep track
of the execution of the global program; and (iii) a local compiler
that takes a local program p as input and generates OpenFlow forwarding
tables, using a generalization of binary decision diagrams
as an intermediate representation. Overall, our compiler automatically
generates the extra state needed to implement virtual and
global programs, with performance that is dramatically faster than
current SDN compilers.

These three stages are designed to work well together—e.g., the 
fabric constructed by the virtual compiler is expressed in terms of 
regular paths, which are translated to local programs by the global 
compiler, and the local and global compilers both use FDDs as 
an intermediate representation. However, the individual compiler 
stages can also be used independently. For example, the global 
compiler provides a general mechanism for compiling forwarding 
paths specified using regular expressions to SDN switches. We 
have also been working with the developers of Pyretic to improve 
performance by retargeting its backend to use our local compiler. 

The next few sections present these stages in detail, starting with 
local compilation and building up to global and virtual compilation. 

3. Local Compilation 

The foundation of our compiler pipeline is a translation that maps 
local NetKAT programs to OpenFlow forwarding tables. Recall that 
a local program describes the hop-by-hop behavior of individual 
switches—i.e. it does not contain dup or switch modifications. 

Compilation via forwarding tables. A simple approach to compiling
local programs is to define a translation that maps primitive
constructs to forwarding tables and operators such as union and
sequential composition to functions that implement the analogous
operations on tables. For example, the current NetKAT compiler
translates the modification pt←2 to a forwarding table with a single
rule that sets the port of all packets to 2 (Figure 2(a)), while it
translates the predicate dst=A to a flow table with two rules: the
first matches packets where dst=A and leaves them unchanged and
the second matches all other packets and drops them (Figure 2(b)).

To compile the sequential composition of these programs, the
compiler combines each row in the first table with the entire second
table, retaining rules that could apply to packets produced by the
row (Figure 2(c)). In the example, the second table has a single
rule that sends all packets to port 2. The first rule of the first
table matches packets with destination A, thus the second table
is transformed to only send packets with destination A to port
2. However, the second rule of the first table drops all packets,
therefore no packets ever reach the second table from this rule.

To compile a union, the compiler computes the pairwise intersection
of all patterns to account for packets that may match both
tables. For example, in Figure 2(d), the two sub-programs forward
traffic to hosts A and B based on the dst header. These two
sub-programs do not overlap with each other, which is why the
table in the figure appears simple. However, in general, the two
programs may overlap. Consider compiling the union of the forwarding
program in Figure 2(d) and the monitoring program in
Figure 2(e). The monitoring program sends SSH packets and packets
with dst=A to port 3. The intersection will need to consider all
interactions between pairs of rules, an O(n²) operation. Since a
NetKAT program may be built out of several nested programs and
compilation is quadratic at each step, we can easily get a tower of
squares or exponential behavior.
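
The all-pairs intersection can be sketched as follows. This is a simplified Python model, not the code of any of the compilers discussed: a table is a priority-ordered list of (pattern, actions) rules ending in a catch-all, a pattern is a dictionary in which a missing field is a wildcard, and the rule contents below are assumed from the prose description of the forwarding and monitoring programs.

```python
def intersect(p1, p2):
    """Intersect two patterns; return None if they require different values."""
    merged = dict(p1)
    for f, n in p2.items():
        if merged.get(f, n) != n:
            return None
        merged[f] = n
    return merged

def union_tables(t1, t2):
    """Union of two total tables: one rule per compatible pair, in priority order."""
    rules = []
    for pat1, acts1 in t1:
        for pat2, acts2 in t2:
            pat = intersect(pat1, pat2)
            if pat is not None:
                rules.append((pat, acts1 | acts2))
    return rules

def lookup(table, pk):
    """Return the actions of the first (highest-priority) matching rule."""
    for pat, acts in table:
        if all(pk.get(f) == n for f, n in pat.items()):
            return acts
    return frozenset()

forwarding = [({"dst": "A"}, frozenset({"pt<-1"})),
              ({"dst": "B"}, frozenset({"pt<-2"})),
              ({}, frozenset())]
monitoring = [({"proto": "ssh"}, frozenset({"pt<-3"})),
              ({"dst": "A"}, frozenset({"pt<-3"})),
              ({}, frozenset())]
combined = union_tables(forwarding, monitoring)
```

Two three-rule tables already yield eight rules here (only the pair that requires both dst=B and dst=A is discarded), and some of those rules are shadowed by earlier ones. Repeating this construction for nested unions is what produces the blow-up described above.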

Approaches based on flow tables are attractive for their simplicity,
but they suffer several serious limitations. One issue is that
tables are not an efficient way to represent packet-processing functions
since each rule in a table can only encode positive tests on
packet headers. In general, the compiler must emit sequences of
prioritized rules to encode operators such as negation or union.
Moreover, the algorithms that implement these operators are
worst-case quadratic, which can cause the compiler to become a bottleneck
on large inputs. Another issue is that there are generally many
equivalent ways to encode the same packet-processing function as
a forwarding table. This means that a straightforward computation
of fixed-points, as is needed to implement Kleene star, is not guaranteed
to terminate.
Binary decision diagrams. To avoid these issues, our compiler
is based on a novel representation of packet-forwarding functions 
using a generalization of binary decision diagrams (BDDs) mm- 
To briefly review, a BDD is a data structure that encodes a boolean 
function as a directed acyclic graph. The interior nodes encode 
boolean variables and have two outgoing edges: a true edge drawn 
as a solid line, and a false edge drawn as a dashed line. The leaf 
nodes encode constant values true or false. Given an assignment 
to the variables, we can evaluate the expression by following the 
appropriate edges in the graph. An ordered BDD imposes a total 
order in which the variables are visited. In general, the choice of 
variable-order can have a dramatic effect on the size of a BDD and 
hence on the run-time of BDD-manipulating operations. Picking 
an optimal variable-order is NP-hard, but efficient heuristics often 
work well in practice. A reduced BDD has no isomorphic subgraphs 
and every interior node has two distinct successors. A BDD can be 
reduced by repeatedly applying these two transformations: 

• If two subgraphs are isomorphic, delete one by connecting its 
incoming edges to the isomorphic nodes in the other, thereby 
sharing a single copy of the subgraph. 

• If both outgoing edges of an interior node lead to the same successor,
eliminate the interior node by connecting its incoming
edges directly to the common successor node.

Logically, an interior node can be thought of as representing an
IF-THEN-ELSE expression.* For example, the expression:

(a ? (c ? 1 : (d ? 1 : 0)) : (b ? (c ? 1 : (d ? 1 : 0)) : 0))

represents a BDD for the boolean expression (a ∨ b) ∧ (c ∨ d). This
notation makes the logical structure of the BDD clear while abstracting
away from the sharing in the underlying graph representation
and is convenient for defining BDD-manipulating algorithms.
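
The two reduction rules and the conditional notation can be illustrated with a small hash-consing node constructor. The following Python sketch is a generic textbook-style construction, not our implementation; the class name `BDD` and the methods `mk` and `evaluate` are hypothetical.

```python
from itertools import product

class BDD:
    """A tiny manager for reduced, ordered BDDs (illustrative sketch)."""

    def __init__(self):
        self.unique = {}            # (var, hi, lo) -> node id
        self.nodes = [None, None]   # ids 0 and 1 are the leaves false and true

    def mk(self, var, hi, lo):
        """Build the node (var ? hi : lo), applying both reduction rules."""
        if hi == lo:                # both edges reach the same successor
            return hi
        key = (var, hi, lo)
        if key not in self.unique:  # share isomorphic subgraphs
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def evaluate(self, node, env):
        """Follow true/false edges according to the assignment env."""
        while node > 1:
            var, hi, lo = self.nodes[node]
            node = hi if env[var] else lo
        return bool(node)

# (a ? (c ? 1 : (d ? 1 : 0)) : (b ? (c ? 1 : (d ? 1 : 0)) : 0))
bdd = BDD()
d = bdd.mk("d", 1, 0)
c = bdd.mk("c", 1, d)
b = bdd.mk("b", c, 0)
root = bdd.mk("a", c, b)
```

Because the subgraph rooted at `c` is created once and referenced twice, the diagram has four interior nodes even though the conditional expression mentions c and d twice each.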

In principle, we could use BDDs to directly encode NetKAT 
programs as follows. We would treat packet headers as flat, n-bit 
vectors and encode NetKAT predicates as n-variable BDDs. Since 
NetKAT programs produce sets of packets, we could represent them 
in a relational style using BDDs with 2n variables. However, there 
are two issues with this representation: 

• Typical NetKAT programs modify only a few headers and leave
the rest unchanged. The BDD that represents such a program
would have to encode the identity relation between most of
its input-output variables. Encoding the identity relation with
BDDs requires a linear amount of space, so even trivial programs,
such as the identity program, would require large BDDs.

• The final step of compilation needs to produce a prioritized
flow table. It is not clear how to efficiently translate BDDs
that represent NetKAT programs as relations into tables that
represent packet-processing functions. For example, a table of
length one is sufficient to represent the identity program, but to
generate this table from the BDD sketched above, several paths
would have to be compressed into a single rule.

* We write conditionals as (a ? b : c), in the style of the C ternary operator.

Forwarding Decision Diagrams. To encode NetKAT programs as
decision diagrams, we introduce a modest generalization of BDDs
called forwarding decision diagrams (FDDs). An FDD differs from
a BDD in two ways. First, interior nodes match header fields instead
of individual bits, which means we need far fewer variables compared
to a BDD to represent the same program. Our FDD implementation
requires 12 variables (because OpenFlow supports 12
headers), but these headers span over 200 bits. Second, leaf nodes
in an FDD directly encode packet modifications instead of boolean
values. Hence, FDDs do not encode programs in a relational style.

Figures|^and|^show FDDs for a program that forwards HTTP 
packets to hosts 10.0.0.1 and 10.0.0.2 at ports 1 and 2 respectively. 
The diagrams have interior nodes that match on headers and leaf 
nodes corresponding to the actions used in the program. 

To generalize ordered BDDs to FDDs, we assume orderings
on fields and values, both written ⊏, and lift them to tests f=n
lexicographically:

f₁=n₁ ⊏ f₂=n₂ ≜ (f₁ ⊏ f₂) ∨ (f₁ = f₂ ∧ n₁ ⊏ n₂)
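
The lexicographic ordering on tests amounts to comparing (field rank, value) pairs. A minimal Python sketch, assuming a hypothetical field ordering `FIELD_ORDER` and string-valued fields:

```python
# Assumed ordering on fields; any fixed total order would do.
FIELD_ORDER = ["proto", "dst", "pt"]

def order_key(t):
    """Map a test (f, n) to a key whose tuple order is the lexicographic order on tests."""
    f, n = t
    return (FIELD_ORDER.index(f), n)

tests = [("dst", "10.0.0.2"), ("proto", "http"), ("dst", "10.0.0.1")]
ordered = sorted(tests, key=order_key)
```

Here `ordered` places the proto test first, followed by the two dst tests in value order, because tests on the same field are compared by value.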

We require that tests be arranged in ascending order from the root.
For reduced FDDs, we stipulate that they must have no isomorphic
subgraphs and that each interior node must have two unique
successors, as with BDDs, and we also require that the FDD must
not contain redundant tests and modifications. For example, if the
test dst=10.0.0.1 is true, then dst=10.0.0.2 must be false. Accordingly,
an FDD should not perform the latter test if the former
succeeds. Similarly, because NetKAT’s union operator (p + q)
is associative, commutative, and idempotent, to broadcast packets
to both ports 1 and 2 we could either write pt←1 + pt←2
or pt←2 + pt←1. Likewise, repeated modifications to the same
header are equivalent to just the final modification, and modifications
to different headers commute. Hence, updating the dst header
to 10.0.0.1 and then immediately re-updating it to 10.0.0.2 is the
same as updating it to 10.0.0.2. In our implementation, we enforce
the conditions for ordered, reduced FDDs by representing actions as
sets of sets of modifications, and by using smart constructors that
eliminate isomorphic subgraphs and contradictory tests.

Figure 6: Auxiliary definitions for local compilation to FDDs.

Figure 7: Local compilation to FDDs.

Figurel^summarizes the syntax, semantics, and well-formedness 
conditions for FDDs formally. Syntactically, an FDD d is either a
constant diagram specified by a set of actions {a₁, ..., aₖ}, where
an action a is a finite map {f₁←n₁, ..., fₖ←nₖ} from fields to
values such that each field occurs at most once; or a conditional
diagram (f=n ? d₁ : d₂) specified by a test f=n and two sub-diagrams.
Semantically, an action a denotes a sequence of modifications,
a constant diagram {a₁, ..., aₖ} denotes the union of the
individual actions, and a conditional diagram (f=n ? d₁ : d₂) tests
if the packet satisfies the test and evaluates the true branch (d₁)
or false branch (d₂) accordingly. The well-formedness judgments
Γ ⊏ (f, n) and Γ ⊢ d ensure that tests appear in ascending order
and do not contradict previous tests to the same field. The context
Γ keeps track of previous tests and boolean outcomes.
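
The syntax and semantics of FDDs can be mirrored by a small Python sketch. The representation is an assumption made for illustration (leaves are frozensets of actions, conditionals are tuples), the names `leaf`, `cond`, `evaluate`, and `count_paths` are hypothetical, and the example diagram only follows the spirit of the HTTP forwarding program described earlier.

```python
def leaf(*actions):
    """Constant diagram: a set of actions, each a finite map from fields to values."""
    return frozenset(frozenset(a.items()) for a in actions)

def cond(f, n, hi, lo):
    """Conditional diagram (f=n ? hi : lo); collapses when both branches agree."""
    return hi if hi == lo else ("if", f, n, hi, lo)

def evaluate(d, pk):
    """Walk the tests down to a leaf, then apply every action there to the packet."""
    while isinstance(d, tuple):
        _, f, n, hi, lo = d
        d = hi if pk.get(f) == n else lo
    return {frozenset({**pk, **dict(a)}.items()) for a in d}

def count_paths(d):
    """Number of root-to-leaf paths in the diagram."""
    if not isinstance(d, tuple):
        return 1
    return count_paths(d[3]) + count_paths(d[4])

DROP = leaf()      # the empty action set {} drops all packets
ID = leaf({})      # the singleton {{}} holding the identity action copies packets

fdd = cond("proto", "http",
           cond("dst", "10.0.0.1", leaf({"pt": 1}),
                cond("dst", "10.0.0.2", leaf({"pt": 2}), DROP)),
           DROP)
```

Note that the true branch of the dst=10.0.0.1 test never tests dst again, matching the requirement that a diagram contain no redundant tests. Sharing of isomorphic subgraphs is not modeled here beyond structural equality.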

Local compiler. Now we are ready to present the local compiler
itself, which goes in two stages. The first stage translates NetKAT
source programs into FDDs, using the simple recursive translation
given in Figures 6 and 7.

The NetKAT primitives true, false, and $f \leftarrow n$ all compile to
simple constant FDDs. Note that the empty action set {} drops all
packets, while the singleton action set {{}} containing the identity
action {} copies packets verbatim. NetKAT tests $f = n$ compile to a
conditional whose branches are the constant diagrams for true and
false, respectively. NetKAT union, sequence, negation, and star all
recursively compile their sub-programs and combine the results using
the corresponding operations on FDDs, which are given in the
accompanying figure.

The FDD union operator ($d_1 + d_2$) walks down the structure of $d_1$
and $d_2$ and takes the union of the action sets at the leaves.
However, the definition is a bit involved, as some care is needed to
preserve well-formedness. In particular, when combining multiple
conditional diagrams into one, one must ensure that the ordering on
tests is respected and that the final diagram does not contain
contradictions. Readers familiar with BDDs may notice that this
function is simply the standard “apply” operation (instantiated with
union at the leaves).

The sequential composition operator ($d_1 \cdot d_2$) merges two
packet-processing functions into a single function. It uses auxiliary
operations $d|_{f=n}$ and $d|_{f \neq n}$ to restrict a diagram $d$ by
a positive or negative test, respectively. We elide the sequence
operator on atomic actions (which behaves like a right-biased merge of
finite maps) and the negative restriction operator (which is similar
to positive restriction, but not identical due to contradictory tests)
to save space. The first few cases of the sequence operator handle
situations where a single action on the left is composed with a
diagram on the right. When the diagram on the right is a conditional,
($f = n$ ? $d_1$ : $d_2$), we partially evaluate the test using the
modifications contained in the action on the left. For example, if the
left action contains the modification $f \leftarrow n$, we know that
the test will be true, whereas if the left action modifies the field
to another value, we know the test will be false. The case that
handles sequential composition of a conditional diagram on the left is
also interesting. It uses restriction and union to implement the
composition, reordering and removing contradictory tests as needed to
ensure well-formedness.

The negation operator $\neg d$ is defined in the obvious way. Note
that because negation can only be applied to predicates, the leaves of
the diagram $d$ are either {} or {{}}. Finally, the FDD Kleene star
operator $d^*$ is defined using a straightforward fixed-point
computation. The well-formedness conditions on FDDs ensure that a
fixed point exists.
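To make the union operation concrete, the following Python sketch models FDDs as ordered decision trees and implements union as an “apply” with set union at the leaves. It is a minimal illustration under stated assumptions, not the compiler's implementation: the names Leaf, Branch, mk_branch, restrict, union, and evaluate are hypothetical, tests are ordered lexicographically by (field, value), and hash consing and memoization are omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

# An action is a finite map of modifications, e.g. frozenset({("pt", 2)}).
Action = FrozenSet[Tuple[str, int]]


@dataclass(frozen=True)
class Leaf:
    actions: FrozenSet[Action]  # {} drops; {{}} is the identity


@dataclass(frozen=True)
class Branch:
    field: str
    value: int
    hi: "FDD"  # taken when field == value
    lo: "FDD"  # taken otherwise


FDD = Union[Leaf, Branch]


def mk_branch(field, value, hi, lo):
    """Smart constructor: drop a test whose two branches agree."""
    return hi if hi == lo else Branch(field, value, hi, lo)


def restrict(d, field, value, holds):
    """Specialise d assuming field == value (holds) or field != value."""
    if isinstance(d, Leaf):
        return d
    if d.field == field:
        if holds:
            # field == value decides every test on this field
            taken = d.hi if d.value == value else d.lo
            return restrict(taken, field, value, holds)
        if d.value == value:
            # field != value makes this exact test false
            return restrict(d.lo, field, value, holds)
    return mk_branch(d.field, d.value,
                     restrict(d.hi, field, value, holds),
                     restrict(d.lo, field, value, holds))


def union(d1, d2):
    """BDD-style apply, instantiated with set union at the leaves."""
    if isinstance(d1, Leaf) and isinstance(d2, Leaf):
        return Leaf(d1.actions | d2.actions)
    # Branch on the smallest root test so the result stays ordered.
    field, value = min((d.field, d.value)
                       for d in (d1, d2) if isinstance(d, Branch))
    hi = union(restrict(d1, field, value, True),
               restrict(d2, field, value, True))
    lo = union(restrict(d1, field, value, False),
               restrict(d2, field, value, False))
    return mk_branch(field, value, hi, lo)


def evaluate(d, packet):
    """Run a packet (a dict) through the diagram; returns output packets."""
    while isinstance(d, Branch):
        d = d.hi if packet.get(d.field) == d.value else d.lo
    return [{**packet, **dict(a)} for a in d.actions]


drop = Leaf(frozenset())


def forward(port):
    return Leaf(frozenset({frozenset({("pt", port)})}))


a = Branch("dst", 1, forward(1), drop)
b = Branch("dst", 2, forward(2), drop)
u = union(a, b)
```

In this example, u tests dst=1 first and dst=2 only on the false branch, so the combined diagram contains no contradictory path such as dst=1 followed by dst=2.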
Figure 8: Forwarding table generation example.


The soundness of local compilation from NetKAT programs to 
FDDs is captured by the following theorem: 

Theorem 1 (Local Soundness). If $\mathcal{L}[\![p]\!] = d$ then $[\![p]\!]\ h = [\![d]\!]\ h$.
Proof. Straightforward induction on $p$. □

The second stage of local compilation converts FDDs to forwarding
tables. By design, this transformation is mostly straightforward: we
generate a forwarding rule for every path from the root to a leaf,
using the conjunction of tests along the path as the pattern and the
actions at the leaf. For example, the FDD in Figure 8 has four paths
from the root to the leaves, so the resulting forwarding table has
four rules. The left-most path is the highest-priority rule and the
right-most path is the lowest-priority rule. Traversing paths from
left to right has the effect of traversing true-branches before their
associated false-branches. This makes sense, since the only way to
encode a negative predicate is to partially shadow a negative rule
with a positive rule. For example, the last rule in the figure cannot
encode the test proto≠http. However, since that rule is preceded by a
pattern that tests proto=http, we can reason that the proto field is
not HTTP in the last rule. If performed naively, this strategy could
create a lot of extra forwarding rules, e.g., the table in Figure 8
has two drop rules, even though one of them completely shadows the
other. In a later section, we discuss optimizations that eliminate
redundant rules, exploiting the FDD representation.
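The path enumeration described above can be sketched in a few lines of Python. This is an illustration only: the tuple encoding of diagrams, the function names to_table and lookup, and the concrete field values in the example are assumptions chosen to resemble the four-rule example discussed in the text, not a reproduction of the figure.

```python
# Diagrams are nested tuples (a hypothetical encoding for this sketch):
#   ("leaf", actions)                      a leaf holding its actions
#   ("branch", field, value, hi, lo)       test field == value


def to_table(d, pattern=()):
    """Emit one (pattern, actions) rule per root-to-leaf path.

    True branches are visited first, so earlier rules have higher priority.
    """
    if d[0] == "leaf":
        return [(dict(pattern), d[1])]
    _, field, value, hi, lo = d
    # True branch: the test joins the pattern. False branch: a negative
    # test cannot be written as a pattern, so it is left implicit and
    # relies on the higher-priority rules emitted for the true branch.
    return to_table(hi, pattern + ((field, value),)) + to_table(lo, pattern)


def lookup(table, packet):
    """Return the actions of the first (highest-priority) matching rule."""
    for pattern, actions in table:
        if all(packet.get(f) == v for f, v in pattern.items()):
            return actions
    return None


example = ("branch", "proto", "http",
           ("branch", "dst", "10.0.0.1", ("leaf", "pt:=1"),
            ("branch", "dst", "10.0.0.2", ("leaf", "pt:=2"),
             ("leaf", "drop"))),
           ("leaf", "drop"))

table = to_table(example)
for priority, (pattern, actions) in enumerate(table):
    print(priority, pattern, actions)
```

The third rule (proto=http, drop) is the kind of redundant rule mentioned in the text: removing it would not change behavior here, because the final catch-all rule also drops.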

4. Global Compilation 

Thus far, we have seen how to compile local NetKAT programs into 
forwarding tables using FDDs. Now we turn to the global compiler, 
which translates global programs into equivalent local programs. 

In general, the translation from global to local programs requires
introducing extra state, since global programs may use regular
expressions to describe end-to-end forwarding paths (recall the
earlier example of a global program with two overlapping paths).
Put another way, because a local program does
not contain dup, the compiler can analyze the entire program and 
generate an equivalent forwarding table that executes on a single 
switch, whereas the control flow of a global program must be made 
explicit so execution can be distributed across multiple switches. 
More formally, a local program encodes a function from packets to 
sets of packets, whereas a global program encodes a function from 
packets to sets of packet-histories. 

To generate the extra state needed to encode the control flow 
of a global, distributed execution into a local program, the global 
compiler translates programs into finite state automata. To a first 
approximation, the automaton can be thought of as the one for 
the regular expression embedded in the global program, and the 
instrumented local program can be thought of as encoding the 
states and transitions of that automaton in a special header field. 
The actual construction is a bit more complex for several reasons. 
First, we cannot instrument the topology in the same way that we 
instrument switch terms. Second, we have to be careful not to
introduce extra states that may lead to duplicate packet histories
being generated. Third, NetKAT programs have more structure than 
ordinary regular expressions, since they denote functions on packet 
histories rather than sets of strings, so a more complicated notion 
of automaton—a symbolic NetKAT automaton—is needed. 

At a high level, the global compiler proceeds in several steps:

• It compiles the input program to an equivalent symbolic au¬ 
tomaton. All valid paths through the automaton alternate be¬ 
tween switch-processing states and topology-processing states, 
which enables executing them as local programs. 

• It introduces a program counter by instrumenting the automa¬ 
ton to keep track of the current automaton state in the pc field. 

• It determinizes the NetKAT automaton using an analogue of the 
subset construction for finite automata. 

• It uses heuristic optimizations to reduce the number of states. 

• It merges all switch-processing states into a single switch state 
and all topology-processing states into a single topology state. 

The final result is a single local program that can be compiled using 
the local compiler. This program is equivalent to the original global 
program, modulo the pc field, which records the automaton state. 

4.1 NetKAT Automata 

In prior work, some of the authors introduced NetKAT automata and 
proved the analogue of Kleene’s theorem: programs and automata 
have the same expressive power. This allows us to use automata as
an intermediate representation for arbitrary NetKAT programs. This
section reviews NetKAT automata, which are used in
the global compiler, and then presents a function that constructs an 
automaton from an arbitrary NetKAT program. 

Definition 1 (NetKAT Automaton). A NetKAT automaton is a tuple
$(S, s_0, \epsilon, \delta)$, where:

• $S$ is a finite set of states,

• $s_0 \in S$ is the start state,

• $\epsilon : S \to \mathrm{Pk} \to \mathcal{P}(\mathrm{Pk})$ is the observation function, and

• $\delta : S \to \mathrm{Pk} \to \mathcal{P}(\mathrm{Pk} \times S)$ is the continuation function.

A NetKAT automaton is said to be deterministic if $\delta$ maps each
packet to a unique next state at every state, or more formally if

$$|\{ s' : S \mid (pk', s') \in \delta\ s\ pk \}| \le 1$$

for all states $s$ and packets $pk$ and $pk'$.

The inputs to NetKAT automata are guarded strings drawn from the set
$\mathrm{Pk} \cdot (\mathrm{Pk} \cdot \mathrm{dup})^* \cdot \mathrm{Pk}$. That is, the inputs have the form

$$pk_{in} \cdot pk_1 \cdot \mathrm{dup} \cdot pk_2 \cdot \mathrm{dup} \cdots pk_n \cdot \mathrm{dup} \cdot pk_{out}$$

where $n \ge 0$. Intuitively, such strings represent packet-histories
through a network: $pk_{in}$ is the input state of a packet, $pk_{out}$
is the output state, and the $pk_i$ are the intermediate states of the
packet that are recorded as it travels through the network.

To process such a string, an automaton in state $s$ can either accept
the trace if $n = 0$ and $pk_{out} \in \epsilon\ s\ pk_{in}$, or it can
consume one packet and dup from the start of the string and transition
to state $s'$ if $n > 0$ and $(pk_1, s') \in \delta\ s\ pk_{in}$. In the
latter case, the automaton yields a residual trace:

$$pk_1 \cdot pk_2 \cdot \mathrm{dup} \cdots pk_n \cdot \mathrm{dup} \cdot pk_{out}$$

Note that the “output” $pk_1$ of state $s$ becomes the “input” to the
successor state $s'$. More formally, acceptance is defined as:

$$\mathrm{accept}\ s\ (pk_{in} \cdot pk_{out}) \;:\Leftrightarrow\; pk_{out} \in \epsilon\ s\ pk_{in}$$

$$\mathrm{accept}\ s\ (pk_{in} \cdot pk_1 \cdot \mathrm{dup} \cdot w) \;:\Leftrightarrow\; \bigvee_{(pk_1, s') \in \delta\ s\ pk_{in}} \mathrm{accept}\ s'\ (pk_1 \cdot w)$$
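The acceptance relation can be read directly as a recursive procedure. The Python sketch below does so for an explicitly enumerated automaton; the representation (dictionaries EPS and DELTA keyed by state and packet, packets as opaque strings, and a trace given as a list with the dup markers left implicit) is an assumption made for illustration, whereas the compiler described here represents these functions symbolically.

```python
def accept(eps, delta, s, trace):
    """Decide acceptance of a guarded string from state s.

    trace = [pk_in, pk_1, ..., pk_n, pk_out]; each pk_i is implicitly
    followed by dup, so a two-element trace is the n = 0 case.
    """
    pk_in, rest = trace[0], trace[1:]
    if len(rest) == 1:
        # n = 0: accept iff pk_out is observable from pk_in in state s
        return rest[0] in eps(s, pk_in)
    pk_1 = rest[0]
    # n > 0: consume pk_1 and dup; pk_1 becomes the input of s'
    return any(accept(eps, delta, s_next, rest)
               for (pk, s_next) in delta(s, pk_in) if pk == pk_1)


# A hypothetical two-state automaton: state 0 rewrites packet "A" to "B"
# and moves to state 1, where "B" may be observed as "C".
EPS = {(1, "B"): {"C"}}
DELTA = {(0, "A"): {("B", 1)}}


def eps(s, pk):
    return EPS.get((s, pk), set())


def delta(s, pk):
    return DELTA.get((s, pk), set())


print(accept(eps, delta, 0, ["A", "B", "C"]))  # trace A . B . dup . C
```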
Figure 9: Auxiliary definitions for NetKAT automata construction.

Next, we define a function that builds an automaton $A(p)$ from an
arbitrary NetKAT program $p$ such that

$$(pk_{out} :: pk_n :: \cdots :: \langle pk_1 \rangle) \in [\![p]\!]\langle pk_{in} \rangle \iff \mathrm{accept}_{A(p)}\ s_0\ (pk_{in} \cdot pk_1 \cdot \mathrm{dup} \cdots pk_n \cdot \mathrm{dup} \cdot pk_{out})$$

The construction is based on Antimirov partial derivatives for
regular expressions. We fix a set of labels $L$, and annotate each
occurrence of dup in the source program $p$ with a unique label
$\ell \in L$. We then define a pair of functions:

• $\mathcal{E}[\![\cdot]\!] : \mathrm{Pol} \to \mathrm{Pol}$ and

• $\mathcal{D}[\![\cdot]\!] : \mathrm{Pol} \to \mathcal{P}(\mathrm{Pol} \times L \times \mathrm{Pol})$

Intuitively, $\mathcal{E}[\![p]\!]$ can be thought of as extracting the local
components from $p$ (and will be used to construct $\epsilon$), while
$\mathcal{D}[\![p]\!]$ extracts the global components (and will be used to
construct $\delta$). A triple $\langle d, \ell, k \rangle \in \mathcal{D}[\![p]\!]$
represents the derivative of $p$ with respect to $\mathrm{dup}^\ell$. That
is, $d$ is the dup-free component of $p$ up to $\mathrm{dup}^\ell$, and $k$
is the residual program (or continuation) of $p$ after $\mathrm{dup}^\ell$.

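We calculate $\mathcal{E}[\![p]\!]$ and $\mathcal{D}[\![p]\!]$ simultaneously using a
simple recursive algorithm defined in Figure 9. The definition makes
use of the following abbreviations,

$$\mathcal{D}[\![p]\!] \cdot q = \{ \langle d, \ell, k \cdot q \rangle \mid \langle d, \ell, k \rangle \in \mathcal{D}[\![p]\!] \}$$

$$q \cdot \mathcal{D}[\![p]\!] = \{ \langle q \cdot d, \ell, k \rangle \mid \langle d, \ell, k \rangle \in \mathcal{D}[\![p]\!] \}$$

which lift sequencing to sets of triples in the obvious way.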
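As a rough illustration of how such a pair of functions can be computed, the Python sketch below uses Antimirov-style rules on a small tuple-encoded syntax. The rules are an assumption: the figure's exact definitions are not reproduced in this text, so the cases here are the standard ones for partial derivatives, adapted so that dup plays the role of the only "letter". The constructors, the simplifying helper seq, and the function names E and D are likewise hypothetical.

```python
# Programs as tuples (hypothetical encoding):
#   ("true",), ("false",), ("test", f, n), ("mod", f, n), ("dup", label),
#   ("union", p, q), ("seq", p, q), ("star", p)
TRUE, FALSE = ("true",), ("false",)


def seq(p, q):
    """Sequencing with the unit and zero laws applied eagerly."""
    if p == TRUE:
        return q
    if q == TRUE:
        return p
    if FALSE in (p, q):
        return FALSE
    return ("seq", p, q)


def E(p):
    """Local (dup-free) component: behaviour that never reaches a dup."""
    tag = p[0]
    if tag == "dup":
        return FALSE
    if tag == "union":
        return ("union", E(p[1]), E(p[2]))
    if tag == "seq":
        return seq(E(p[1]), E(p[2]))
    if tag == "star":
        return ("star", E(p[1]))
    return p  # true, false, tests and modifications are already local


def D(p):
    """Set of triples (d, label, k): run d, hit dup^label, continue as k."""
    tag = p[0]
    if tag == "dup":
        return {(TRUE, p[1], TRUE)}
    if tag == "union":
        return D(p[1]) | D(p[2])
    if tag == "seq":
        _, a, b = p
        # D[a] . b  together with  E[a] . D[b]
        return ({(d, l, seq(k, b)) for d, l, k in D(a)} |
                {(seq(E(a), d), l, k) for d, l, k in D(b)})
    if tag == "star":
        # E[p*] . D[body] . p*
        return {(seq(E(p), d), l, seq(k, p)) for d, l, k in D(p[1])}
    return set()


test_sw = ("test", "sw", 1)
set_pt = ("mod", "pt", 2)
program = ("seq", test_sw, ("seq", ("dup", 1), set_pt))
print(E(program), D(program))
```

For the example program sw=1 · dup¹ · pt←2, the local component is false (every execution passes through the dup) and the single derivative has d = (sw=1) and continuation k = (pt←2).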

The next lemma characterizes $\mathcal{E}[\![p]\!]$ and $\mathcal{D}[\![p]\!]$, using the
following notation to reconstruct programs from sets of triples:

$$\sum \mathcal{D}[\![p]\!] = \sum_{\langle d, \ell, k \rangle \in \mathcal{D}[\![p]\!]} d \cdot \mathrm{dup} \cdot k$$

Lemma 1 (Characterization of $\mathcal{E}[\![\cdot]\!]$ and $\mathcal{D}[\![\cdot]\!]$). For all
programs $p$, we have the following:

(a) $p \equiv \mathcal{E}[\![p]\!] + \sum \mathcal{D}[\![p]\!]$.

(b) $\mathcal{E}[\![p]\!]$ is a local program.

(c) For all $\langle d, \ell, k \rangle \in \mathcal{D}[\![p]\!]$, $d$ is a local program.

(d) For all labels $\ell$ in $p$, there exist unique programs $d$ and $k$
such that $\langle d, \ell, k \rangle \in \mathcal{D}[\![p]\!]$.

Proof. By structural induction on $p$. Claims (b)-(d) are trivial.
Claim (a) can be proved purely equationally using only the NetKAT
axioms and the KAT-DENESTING rule. □

Lemma 1 (d) allows us to write $k_\ell$ to refer to the unique
continuation of $\mathrm{dup}^\ell$. By convention, we let $k_0$ denote the
“initial continuation,” namely $p$.

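Definition 2 (Program Automaton). The NetKAT automaton $A(p)$ for a
program $p$ is defined as $(S, s_0, \epsilon, \delta)$ where

• $S$ is the set of labels occurring in $p$, plus the initial label 0.

• $s_0 = 0$

• $\epsilon\ \ell\ pk = \{ pk' \mid \langle pk' \rangle \in [\![\mathcal{E}[\![k_\ell]\!]]\!]\langle pk \rangle \}$

• $\delta\ \ell\ pk = \{ (pk', \ell') \mid \langle d, \ell', k \rangle \in \mathcal{D}[\![k_\ell]\!] \wedge \langle pk' \rangle \in [\![d]\!]\langle pk \rangle \}$

Theorem 2 (Program Automaton Soundness). For all programs $p$,
packets $pk$ and histories $h$, we have

$$h \in [\![p]\!]\langle pk_{in} \rangle \iff \mathrm{accept}\ s_0\ (pk_{in} \cdot pk_1 \cdot \mathrm{dup} \cdots pk_n \cdot \mathrm{dup} \cdot pk_{out})$$

where $h = pk_{out} :: pk_n :: \cdots :: \langle pk_1 \rangle$.

Proof. We first strengthen the claim, replacing $\langle pk_{in} \rangle$ with an
arbitrary history $pk_{in} :: h'$, $s_0$ with an arbitrary label
$\ell \in S$, and $p$ with $k_\ell$. We then proceed by induction on the
length of the history, using Lemma 1 for the base case and induction
step. □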

4.2 Local Program Generation 

With a NetKAT automaton A(p) for the global program p in hand, 
we are now ready to construct a local program. The main idea is to 
make the state of the global automaton explicit in the local program 
by introducing a new header field pc (represented concretely using 
VLANs, MPLS tags, or any other unused header field) that keeps 
track of the state as the packet traverses the network. This encoding 
enables simulating the automaton for the global program using a 
single local program (along with the physical topology). We also 
discuss determinization and optimization, which are important for 
correctness and performance. 

Program counter. The first step in local program generation is to
encode the state of the automaton into its observation and transition
functions using the pc field. To do this, we use the same structures
as are used by the local compiler, FDDs. Recall that the observation
function $\epsilon$ maps input packets to output packets according to
$\mathcal{E}[\![k_\ell]\!]$, which is a dup-free NetKAT program. Hence, we can
encode the observation function for a given state $\ell$ as a
conditional FDD that tests whether pc is $\ell$ and either behaves like
the FDD for $\mathcal{E}[\![k_\ell]\!]$ or false. We can encode the continuation
function $\delta$ as an FDD in a similar fashion, although we also have
to set the pc to each successor state $s'$. This symbolic
representation of automata using FDDs allows us to efficiently
manipulate automata despite the large size of their “input alphabet”,
namely $|\mathrm{Pk} \times \mathrm{Pk}|$. In our implementation we introduce the
pc field and FDDs on the fly as automata are constructed, rather than
adding them as a post-processing step, as is described here for ease
of exposition.

Determinization. The next step in local program generation is to
determinize the NetKAT automaton. This step turns out to be critical
for correctness: it eliminates extra outputs that would be produced
if we attempted to directly implement a nondeterministic NetKAT
automaton. To see why, consider a program of the form $p + p$.
Intuitively, because union is an idempotent operation, we expect
that this program will behave the same as just a single copy of $p$.
However, this will not be the case when $p$ contains a dup: each
occurrence of dup will be annotated with a different label.
Therefore, when we instrument the program to track automaton states,
it will create two packets that are identical except for the pc
field, instead of one packet as required by the semantics. The
solution to this problem is simply to determinize the automaton
before converting it to a local program. Determinization ensures that
every packet trace induces a unique path through the automaton and
prevents duplicate packets from being produced. Using FDDs to
represent the automaton symbolically is crucial for this step: it
allows us to implement a NetKAT analogue of the subset construction
efficiently.
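The effect of determinization on the p + p example can be seen in a small explicit-state sketch. The code below is not the symbolic, FDD-based construction used by the compiler; it is a plain subset construction over an enumerated continuation function, with a hypothetical dictionary encoding and packets treated as opaque strings.

```python
from collections import defaultdict


def determinize(delta, start, packets):
    """Explicit subset construction for the continuation function.

    delta:   dict mapping (state, pk) -> set of (pk', state')
    packets: finite list of input packets to consider
    Returns (det_delta, start_set), where states are frozensets.
    """
    start_set = frozenset([start])
    det_delta, todo = {}, [start_set]
    while todo:
        current = todo.pop()
        for pk in packets:
            if (current, pk) in det_delta:
                continue
            # Group successors by output packet: all states reachable
            # with the same output collapse into one set-state.
            by_output = defaultdict(set)
            for s in current:
                for pk_out, s_next in delta.get((s, pk), ()):
                    by_output[pk_out].add(s_next)
            successors = {(pk_out, frozenset(states))
                          for pk_out, states in by_output.items()}
            det_delta[(current, pk)] = successors
            todo.extend(target for _, target in successors)
    return det_delta, start_set


# Shape of p + p when p contains one dup: the two copies of the dup get
# labels 1 and 2, so the same output packet "B" is produced twice.
nondet = {(0, "A"): {("B", 1), ("B", 2)}}
det, s0 = determinize(nondet, 0, ["A", "B"])
print(det[(s0, "A")])
```

In the nondeterministic automaton, tagging packets with the pc would yield two copies of B (one with pc=1, one with pc=2); after determinization there is a single successor, the set-state {1, 2}, and hence a single packet.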

Optimization. One practical issue with building automata using the
algorithms described so far is that they can use a large number of
states (one for each occurrence of dup in the program), and
determinization can increase the number of states by an exponential
factor. Although these automata are not wrong, attempting to compile
them can lead to practical problems, since extra states will trigger
a proliferation of forwarding rules that must be installed on
switches. Because switches today often have limited amounts of memory
(often only a few thousand forwarding rules), reducing the number of
states is an important optimization. An obvious idea is to optimize
the automaton using (generalizations of) textbook minimization
algorithms. Unfortunately, this would be prohibitively expensive,
since deciding whether two states are equal is a costly operation in
the case of NetKAT automata. Instead, we adopt a simple heuristic that
works well in practice and simply merge states that are identical. In
particular, by representing the observation and transition functions
as FDDs, which are hash consed, testing equality is cheap: a simple
pointer comparison.

Local Program Extraction. The final step is to extract a local
program from the automaton. Recall that, by definition, links are
enclosed by dups on either side, and links are the only NetKAT terms
that contain dups or modify the switch field. It follows that every
global program gives rise to a bipartite NetKAT automaton in which
all accepting paths alternate between “switch states” (which do not
modify the switch field) and “link states” (which forward across
links and do modify the switch field), beginning with a switch state.
Intuitively, the local program we want to extract is simply the union
of the $\epsilon$ and $\delta$ FDDs of all switch states (recall
Lemma 1 (a)), with the link states implemented by the physical
network. Note, however, that the physical network will neither match
on the pc nor advance the pc to the next state (while the link states
in our automaton do). To fix the latter, we observe that any link
state has a unique successor state. We can thus simply advance the pc
by two states instead of one at every switch state, anticipating the
missing pc modification in link states. To address the former, we
employ the equivalence

$$[sw_1{:}pt_1] \to [sw_2{:}pt_2] \;\equiv\; sw = sw_1 \cdot pt = pt_1 \cdot t \cdot sw = sw_2 \cdot pt = pt_2$$

It allows us to replace links with the entire topology if we modify
switch states to match on the appropriate source and destination
locations immediately before and after transitioning across a link.
After modifying the $\epsilon$ and $\delta$ FDDs accordingly and taking
the union of all switch states as described above, the resulting FDD
can be passed to the local compiler to generate forwarding tables.

The tables will correctly implement the global program provided the
physical topology (in, t, out) satisfies the following:

• $p \equiv in \cdot p \cdot out$, i.e., the global program specifies
end-to-end forwarding paths.

• $t$ implements at least the links used in $p$.

• $t \cdot in \equiv \mathrm{false} \equiv out \cdot t$, i.e., the in and out
predicates should not include locations that are internal to the
network.

5. Virtual Compilation 

The third and final stage of our compiler pipeline translates vir¬ 
tual programs to physical programs. Recall that a virtual program 
is one that is defined over a virtual topology. Network virtualization 
can make programs easier to write by abstracting complex physical 
topologies to simpler topologies and also makes programs portable 
across different physical topologies. It can even be used to
multiplex several virtual networks onto a single physical network,
e.g., in multi-tenant datacenters.

To compile a virtual program, the compiler needs to know the mapping
between virtual switches, ports, and links and their counterparts at
the physical level. The programmer supplies a virtual program $v$, a
virtual topology $t$, sets of ingress and egress locations for $t$,
and a relation $\mathcal{R}$ between virtual and physical ports. The
relation $\mathcal{R}$ must map each physical ingress to a virtual ingress,
and conversely for egresses, but is otherwise unconstrained, e.g., it
need not be injective or even a function.^ The constraints on
ingresses and egresses ensure that each packet entering the physical
network lifts uniquely to a packet in the virtual network, and
similarly for packets exiting the virtual network. During execution
of the virtual program, each packet can be thought of as having two
locations, one in the virtual network and one in the physical
network; $\mathcal{R}$ defines which pairs of locations are consistent with
each other. For simplicity, we assume the virtual program is a local
program. If it is not, the programmer can use the global compiler to
put it into local form.

Overview. To execute a virtual program on a physical network, 
possibly with a different underlying topology, the compiler must 
(i) instrument the program to keep track of packet locations in the 
virtual topology and (ii) implement forwarding between locations 
that are adjacent in the virtual topology using physical paths. To 
achieve this, the virtual compiler proceeds as follows: 

1. It instruments the program to use the virtual switch (vsw) and
virtual port (vpt) fields that track the location of the packet
in the virtual topology.

2. It constructs a fabric, a NetKAT program that updates the physical
location of a packet when its virtual location changes and
vice versa, after each step of processing, to restore consistency
with respect to the virtual-physical relation $\mathcal{R}$.

3. It assembles the final program by combining v with the fabric, 
eliminating the vsw and vpt fields, and compiling the result 
using the global compiler. 

Most of the complexity arises in the second step because there may 
be many valid fabrics (or there may be none). However, this step 
is independent of the virtual program. The fabric can be computed 
once and for all and then be reused as the program changes. Fabrics
can be generated in several ways, e.g., to minimize a cost such as
path length or latency, to maximize disjointness, etc.

Instrumentation. To keep track of a packet’s location in the virtual
network, we introduce new packet fields vsw and vpt for the virtual
switch and the virtual port, respectively. We replace all occurrences
of the sw or pt field in the program $v$ and the virtual topology $t$
with vsw and vpt, respectively, using a simple textual substitution.
Packets entering the physical network must be lifted to the virtual
network. Hence, we replace in with a program that matches on all
physical ingress locations $I$ and initializes vsw and vpt in
accordance with $\mathcal{R}$:

$$in' = \sum_{\substack{(sw, pt) \in I \\ (vsw, vpt)\ \mathcal{R}\ (sw, pt)}} \mathrm{sw} = sw \cdot \mathrm{pt} = pt \cdot \mathrm{vsw} \leftarrow vsw \cdot \mathrm{vpt} \leftarrow vpt$$

Recall that we require $\mathcal{R}$ to relate each location in $I$ to at
most one virtual ingress, so the program lifts each packet to at most
one ingress location in the virtual network. The vsw and vpt fields
are only used to track locations during the early stages of virtual
compilation. They are completely eliminated in the final assembly.
Hence, we will not need to introduce additional tags to implement the
resulting physical program.
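The construction of the instrumented ingress program is a simple enumeration, sketched below in Python. The function name build_ingress, the encoding of the relation as a set of location pairs, and the plain-text rendering of terms (";" for sequencing, ":=" for modification, "+" for union) are assumptions made for this illustration and do not correspond to the concrete syntax of any particular tool.

```python
def build_ingress(ingresses, relation):
    """Render in' as a union of match-and-initialise terms.

    ingresses: set of physical ingress locations (sw, pt)
    relation:  set of ((vsw, vpt), (sw, pt)) pairs, i.e. the
               virtual-physical relation restricted to ports
    """
    terms = []
    for (vsw, vpt), (sw, pt) in sorted(relation):
        if (sw, pt) in ingresses:
            # Match the physical ingress, then record the virtual location.
            terms.append(f"sw={sw}; pt={pt}; vsw:={vsw}; vpt:={vpt}")
    # An empty union is the program that drops everything.
    return " + ".join(terms) if terms else "false"


# Hypothetical example: one big virtual switch 10 with ports 1..3,
# mapped onto three physical switches; only two ports are ingresses.
physical_ingresses = {(1, 1), (2, 1)}
relation = {((10, 1), (1, 1)), ((10, 2), (2, 1)), ((10, 3), (3, 1))}
print(build_ingress(physical_ingresses, relation))
```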

Fabric construction. Each packet can be thought of as having two
locations: one in the virtual topology and one in the underlying
physical topology. After executing $in'$, the locations are consistent
according to the virtual-physical relation $\mathcal{R}$. However,
consistency can be broken after each step of processing using the
virtual program $v$ or virtual topology $t$. To restore consistency,
we construct a fabric comprising programs $f_{in}$ and $f_{out}$ from
the virtual and physical topologies and $\mathcal{R}$, and insert it into
the program:

$$q = in' \cdot (v \cdot f_{out}) \cdot (t \cdot f_{in} \cdot v \cdot f_{out})^* \cdot out$$

In this program, $v$ and $t$ alternate with $f_{out}$ and $f_{in}$ in
processing packets, thereby breaking and restoring consistency
repeatedly. Intuitively, it is the job of the fabric to keep the
virtual and physical locations in sync.

^ Actually, we can relax this condition slightly and allow physical
ingresses to map to zero or one virtual ingresses: if a physical
ingress has no corresponding representative in the virtual network,
then packets arriving at that ingress will not be admitted to the
virtual network.

Figure 10: Fabric game graph edges.

This process can be viewed as a two-player game between a virtual
player V (embodied by $v$ and $t$) and a fabric player F (embodied by
$f_{out}$ and $f_{in}$). The players take turns moving a packet across
the virtual and the physical topology, respectively. Player V wins if
the fabric player F fails to restore consistency after a finite
number of steps; player F wins otherwise. Constructing a fabric now
amounts to finding a winning strategy for F.

We start by building the game graph $G = (V, E)$ modeling all possible
ways that consistency can be broken by V or restored by F. Nodes are
pairs of virtual and physical locations, $[l_v, l_p]$, where a location
is a 3-tuple comprising a switch, a port, and a direction that
indicates whether the packet is entering the port (I) or leaving the
port (O). The rules in Figure 10 determine the edges of the game
graph:

• The edge $[l_v, l_p] \to [l_v', l_p]$ exists if V can move packets
from $l_v$ to $l_v'$. There are two ways to do so: either V moves
packets across a virtual switch (V-POL) or across a virtual link
(V-TOPO). In the inference rules, we write $\to_v$ to denote a single
hop in the virtual topology:

$$(vsw, vpt, d) \to_v (vsw', vpt', d')$$

If $d = I$ and $d' = O$ then the hop is across one switch, but if
$d = O$ and $d' = I$ then the hop is across a link.

• The edge $[l_v, l_p] \to [l_v, l_p']$ exists if F can move packets
from $l_p$ to $l_p'$. When F makes a move, it must restore
physical-virtual consistency (the $\mathcal{R}$ relation in the premise of
F-POL and F-TOPO). To do so, it may need to take several hops through
the physical network (written as $\to_p^+$).

• In addition, F may leave a packet at its current location if the
location is already consistent (F-LOOP-IN and F-LOOP-OUT). Note that
these force a packet located at physical location $(sw, pt, O)$ to
leave through port $pt$ eventually. Intuitively, once the fabric has
committed to emitting the packet through a given port, it can only
delay but not withdraw that commitment.

Figure 11: Reachable and fatal nodes.

Although these rules determine the complete game graph, all packets
enter the network at an ingress location (determined by the $in'$
predicate). Therefore, we can restrict our attention to only those
nodes that are reachable from the ingress (reachable nodes in
Figure 11). In the resulting graph $G = (V, E)$, every path represents
a possible trajectory that a packet processed by $q$ may take through
the virtual and physical topology.

In addition to removing unreachable nodes, we must remove fatal
nodes, which are the nodes where F is unable to restore consistency
and thus loses the game. F-FATAL says that any state from which F is
unable to move to a non-fatal state is fatal. In particular, this
includes states in which F cannot move to any other state at all.
V-FATAL says that any state in which V can move to a fatal state is
fatal. Intuitively, we define such states to be fatal since we want
the fabric to work for any virtual program the programmer may write.
Fatal states can be removed using a simple backwards traversal of the
graph starting from nodes without outgoing edges. This process may
remove ingress nodes if they turn out to be fatal. This happens if
and only if there exists no fabric that can always restore
consistency for arbitrary virtual programs. Of course, this case can
only arise if the physical topology is not bidirectional.
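The two fatality rules can be read as a least fixed point over the game graph, as in the following Python sketch. The encoding is an assumption: each node is labelled with the player who moves next ("V" or "F"), which simplifies the inference rules of the figure, and the fixed point is computed by naive iteration rather than by the backwards traversal mentioned above. The names fatal_nodes, owner, and edges are hypothetical.

```python
def fatal_nodes(owner, edges):
    """Compute the set of fatal nodes of a two-player game graph.

    owner: dict node -> "V" or "F" (the player who moves at that node)
    edges: dict node -> set of successor nodes
    """
    fatal = set()
    changed = True
    while changed:
        changed = False
        for node, player in owner.items():
            if node in fatal:
                continue
            successors = edges.get(node, set())
            if player == "F":
                # F-FATAL: F has no move to a non-fatal node
                # (this includes having no move at all).
                is_fatal = all(m in fatal for m in successors)
            else:
                # V-FATAL: V has some move into a fatal node.
                is_fatal = any(m in fatal for m in successors)
            if is_fatal:
                fatal.add(node)
                changed = True
    return fatal


# Hypothetical graph: "dead" is an F-node without moves; "b" can only
# reach "dead"; "a" can escape to "ok"; "in2" lets V force the game to "b".
owner = {"in": "V", "in2": "V", "a": "F", "b": "F", "ok": "V", "dead": "F"}
edges = {"in": {"a"}, "in2": {"b"}, "a": {"ok", "dead"},
         "b": {"dead"}, "ok": set(), "dead": set()}
print(sorted(fatal_nodes(owner, edges)))
```

Here the ingress "in2" is fatal, which corresponds to the situation described in the text where no fabric can restore consistency for every virtual program entering at that location.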

Fabric selection. If all ingress nodes withstand pruning, the
resulting graph encodes exactly the set of all winning strategies for
F, i.e., the set of all possible fabrics. A fabric is a subgraph of
$G$ that contains the ingress, is closed under all possible moves by
the virtual program, and contains exactly one edge out of every state
in which F has to restore consistency. The F-edges must be labeled
with concrete paths through the physical topology, as there may exist
several paths implementing the necessary multi-step transportation
from the source node to the target node.

Figure 12: Experimental results: compilation time. (a) Routing on
k-pod fat-trees. (b) Destination-based routing on topology zoo.
(c) Time needed to compile SDX benchmarks.

In general, there may be many possible fabrics, and the choice of
different F-edges corresponds to fabrics with different
characteristics, such as minimizing hop counts, maximizing disjoint
paths, and so on. Our compiler implements several simple strategies.
For example, given a metric on paths (such as hop count), our greedy
strategy starts at the ingresses and adds a node whenever it is
reachable through an edge $e$ rooted at a node $u$ already selected,
and $e$ is (i) any V-player edge or (ii) the F-player edge whose path
$\pi$ minimizes the metric among all edges and their paths rooted at
$u$.

After a fabric is selected, it is straightforward to encode it as a
NetKAT term. Every F-edge $[l_v, l_p] \to [l_v, l_p']$ in the graph is
encoded as a NetKAT term that matches on the locations $l_v$ and
$l_p$, forwards along the corresponding physical path from $l_p$ to
$l_p'$, and then resets the virtual location to $l_v$. Resetting the
virtual location is semantically redundant but will make it easy to
eliminate the vsw and vpt fields. We then take $f_{in}$ to be the
union of all F-IN-edges, and $f_{out}$ to be the union of all
F-OUT-edges. NetKAT’s global abstractions play a key role, providing
the building blocks for composing multiple overlapping paths into a
unified fabric.

End-to-end Compilation. After the programs $in'$, $f_{in}$, and
$f_{out}$ are calculated from $\mathcal{R}$, we assemble the physical
program $q$ defined above. However, one last potential problem
remains: although the virtual compiler adds instrumentation to update
the physical switch and port fields, the program still matches and
updates the virtual switch (vsw) and virtual port (vpt). Note,
however, that by construction of $q$, any match on the vsw or vpt
field is preceded by a modification of those fields on the same
physical switch. Therefore, all matches are automatically eliminated
during FDD generation, and only modifications of the vsw and vpt
fields remain. These can be safely erased before generating flow
tables, as the global compiler inserts a program counter into $q$
that does double duty, tracking both the physical location and the
virtual location of a packet. Hence, we only need a single tag to
compile virtual programs!

6. Evaluation 

To evaluate our compiler, we conducted experiments on a diverse 
set of real-world topologies and benchmarks. In practice, our com¬ 
piler is a module that is used by the Frenetic SDN controller to map 
NetKAT programs to flow tables. Whenever network events occur, 
e.g., a host connects, a link fails, traffic patterns change, and so 
on, the controller may react by generating a new NetKAT program. 
Since network events may occur rapidly, a slow compiler can easily
be a bottleneck that prevents the controller from reacting quickly to
network events. In addition, the flow tables that the compiler gen¬ 
erates must be small enough to fit on the available switches. More¬ 
over, as small tables can be updated faster than large tables, table 
size affects the controller’s reaction time too. 

Therefore, in all the following experiments we measure flow- 
table compilation time and flow-table size. We apply the compiler 
to programs for a variety of topologies, from topology designs for 
very large datacenters to a dataset of real-world topologies. We 
highlight the effect of important optimizations to the fundamental 
FDD-based algorithms. We perform all experiments on 32-core, 2.6
GHz Intel Xeon E5-2650 machines with 64GB RAM.^ We repeat
all timing experiments ten times and plot their average.

Fat trees. A fat-tree is a modern datacenter network design that uses
commodity switches to minimize cost. It provides several redundant
paths between hosts that can be used to maximize available bandwidth,
provide backup paths, and so on. A fat-tree is organized into pods,
where a $k$-pod fat-tree topology can support up to $k^3/4$ hosts. A
real-world datacenter might have up to 48 pods. Therefore, our
compiler should be able to generate forwarding programs for a 48-pod
fat tree relatively quickly.

Figure 12a shows how the time needed to generate all flow tables
varies with the number of pods in a fat-tree.^ The graph shows that we
take approximately 30 seconds to produce tables for 48-pod fat trees
(i.e., 27,000 hosts) and less than 120 seconds to generate programs
for 60-pod fat trees (i.e., 54,000 hosts).
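The host counts quoted above follow from the standard fat-tree sizing, in which a k-pod fat-tree built from k-port switches supports k³/4 hosts. The short Python check below makes the arithmetic explicit; the function name fat_tree_hosts is hypothetical.

```python
def fat_tree_hosts(k: int) -> int:
    """Hosts supported by a k-pod fat-tree built from k-port switches."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    # k pods, each with k/2 edge switches serving k/2 hosts apiece.
    return k ** 3 // 4


for pods in (48, 60):
    print(pods, fat_tree_hosts(pods))
```

For 48 pods this gives 27,648 hosts and for 60 pods exactly 54,000, matching the rounded figures in the text.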

This experiment shows that the compiler can generate tables for 
large datacenters. But, this is partly because the fat-tree forward¬ 
ing algorithm is topology-dependent and leverages symmetries to 
minimize the amount of forwarding rules needed. Many real-world 
topologies are not regular and require topology-independent for¬ 
warding programs. In the next section, we demonstrate that our 
compiler scales well with these topologies too. 

Topology Zoo. The Topology Zoo [18] is a dataset of a few hundred
real-world network topologies of varying size and structure. For
every topology in this dataset, we use destination-based routing to
connect all nodes to each other. In destination-based routing, each
switch filters packets by their destination address and forwards them
along a spanning tree rooted at the destination. Since each switch
must be able to forward to any destination, the total number of rules
must be $O(n^2)$ for an $n$-node network.
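A minimal sketch of destination-based routing, assuming an undirected, connected topology given as an adjacency list, is shown below. It builds one breadth-first spanning tree per destination and records a next hop at every other switch; the function name destination_routes and the three-switch example are hypothetical, and breadth-first search is just one way to pick the spanning tree.

```python
from collections import deque


def destination_routes(adjacency):
    """Return {switch: {destination: next_hop}} for every switch pair.

    adjacency: dict mapping each switch to a list of its neighbours.
    """
    tables = {switch: {} for switch in adjacency}
    for destination in adjacency:
        # BFS from the destination yields a spanning tree rooted there;
        # each switch forwards towards its parent in that tree.
        seen, queue = {destination}, deque([destination])
        while queue:
            parent = queue.popleft()
            for neighbour in adjacency[parent]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    tables[neighbour][destination] = parent
                    queue.append(neighbour)
    return tables


# Hypothetical line topology a - b - c.
line = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
routes = destination_routes(line)
total_rules = sum(len(table) for table in routes.values())
print(routes, total_rules)
```

Every switch holds one rule per other destination, so an n-node connected network needs n(n-1) rules in total, which is the quadratic growth noted above.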

^ Our compiler is single-threaded and doesn’t leverage multicore.

^ This benchmark uses the switch-specialization optimization, which we
describe in the next section.




















Figure 13: Experimental results: forwarding table compression and
global compilation. Panel (a): Compressing Classbench ACLs. Axis
labels: Compression Ratio; Size Overhead.


Figure 12b shows how the running time of the compiler varies across
the topology zoo benchmarks. The curves are not as smooth as the
curve for fat-trees, since the complexity of forwarding depends on
features of the network topology. Since the topology zoo is so
diverse, this is a good suite to exercise the switch specialization
optimization that dramatically reduces compile time.

A direct implementation of the local compiler builds one 
FDD for the entire network and uses it to generate flow tables for 
each switch. However, since several FDD (and BDD) algorithms are 
fundamentally quadratic, it helps to first specialize the program for 
each switch and then generate a small FDD for each switch in the 
network (switch specialization). Building FDDs for several smaller 
programs is typically much faster than building a single FDD for 
the entire network. As the graph shows, this optimization has a 
dramatic effect on all but the smallest topologies. 
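
The specialization step can be pictured as partial evaluation: fix the switch, resolve every test on the switch field, and simplify. The sketch below is an illustration under stated assumptions, not the compiler's implementation. The tuple-based program representation and the names `specialize`, `ID`, and `DROP` are hypothetical, and it assumes a local program that never modifies the switch field.

```python
DROP = ("filter", False)  # drops every packet
ID = ("filter", True)     # passes every packet unchanged


def specialize(policy, switch):
    """Partially evaluate `policy` assuming the packet is located at `switch`."""
    kind = policy[0]
    if kind == "test":
        _, field, value = policy
        if field == "switch":
            return ID if value == switch else DROP
        return policy
    if kind == "seq":
        left = specialize(policy[1], switch)
        right = specialize(policy[2], switch)
        if DROP in (left, right):
            return DROP          # sequencing with drop is drop
        if left == ID:
            return right
        if right == ID:
            return left
        return ("seq", left, right)
    if kind == "union":
        left = specialize(policy[1], switch)
        right = specialize(policy[2], switch)
        if left == DROP:
            return right         # drop is the unit of union
        if right == DROP:
            return left
        return ("union", left, right)
    return policy                # modifications and constant filters are unchanged


def clause(sw, port):
    # switch = sw ; dst = 10.0.0.1 ; port <- port
    return ("seq", ("test", "switch", sw),
            ("seq", ("test", "dst", "10.0.0.1"), ("mod", "port", port)))


program = ("union", clause(1, 1), clause(2, 3))
print(specialize(program, 1))
```

Only the clause for switch 1 survives, so the FDD built from the specialized program never branches on the switch field or on clauses that belong to other switches.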

SDX. Our experiments thus far have considered some quite large 
forwarding programs, but none of them leverage software-defined 
networking in any interesting way. In this section, we report on our 
performance on benchmarks from a recent SIGCOMM paper [?] 
that proposes a new application of SDN. 

An Internet exchange point (IXP) is a physical location where 
networks from several ISPs connect to each other to exchange 
traffic. Legal contracts between networks are often implemented by 
routing programs at IXPs. However, today’s IXPs use baroque 
protocols that needlessly limit the kinds of programs that can be 
implemented. A software-defined IXP (an “SDX” [?]) gives 
participants fine-grained control over packet processing and peering 
using a high-level network programming language. The SDX prototype 
uses Pyretic [25] to encode policies and presents several examples 
that demonstrate the power of an expressive network programming 
language. 

We build a translator from Pyretic to NetKAT and use it to 
evaluate our compiler on SDX’s own benchmarks. These benchmarks 
simulate a large IXP where a few hundred peers apply programs 
to several hundred prefix groups. The dashed lines in Figure 12 
reproduce a graph from the SDX paper, which shows how compilation 
time varies with the number of prefix groups and the number of 
participants in the SDX. (We get nearly the same numbers as the SDX 
paper on our hardware.) The solid lines show that our compiler is 
orders of magnitude faster. Pyretic takes over 10 minutes to compile 
the largest benchmark, but our compiler only takes two seconds. 

Although Pyretic is written in Python, which is a lot slower than 
OCaml, the main problem is that Pyretic has a simple table-based 
compiler that does not scale (Section [?]). In fact, the authors of 
SDX had to add several optimizations to get the graph depicted. 
Despite these optimizations, our FDD-based approach is substantially 
faster. 

The SDX paper also reports flow-table sizes for the same 
benchmark. At first, our compiler appeared to produce tables that were 
twice as large as Pyretic’s. Naturally, we were unhappy with this 
result and investigated. Our investigation revealed a bug in the 
Pyretic compiler, which would produce incorrect tables that were 
artificially small. The authors of SDX have confirmed this bug, and it 
has been fixed in later versions of Pyretic. We are actively working 
with them to port SDX to NetKAT to help SDX scale further. 

Classbench. Lastly, we compile ACLs generated using Classbench 
[32]. These are realistic firewall rules that showcase another 
optimization: it is often possible to significantly compress tables by 
combining and eliminating redundant rules. 

We build an optimizer for the flow-table generation algorithm 
in Figure [?]. Recall that we generate flow tables by converting 
every complete path in the FDD into a rule. Once a path has been 
traversed, we can remove it from the FDD without harm. However, 
naively removing a path may produce an FDD that is not reduced. 
Our optimization is simple: we remove paths from the FDD as they 
are turned into rules and ensure that the FDD is reduced at each 
step. When the last path is turned into a rule, we are left with a 
trivial FDD. This iterative procedure prevents several unnecessary 
rules from being generated. It is possible to implement other 
canonical optimizations, but this optimization is unique because it 
leverages properties of reduced FDDs. Figure 13a shows that this 
approach can produce 30% fewer rules on average than a direct 
implementation of flow-table generation. We do not report running 
times for the optimizer, but they are negligible in all our 
experiments. 
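
For reference, the direct implementation that serves as the baseline here (one rule per complete path) can be sketched as follows. This is a simplified illustration and does not include the path-removal optimization. The tuple encoding of FDDs and the names `to_table`, `lookup`, and `evaluate` are assumptions, and the sketch presumes a well-formed ordered FDD in which a field is not tested again below a successful test on it.

```python
def to_table(fdd, match=None):
    """Emit (match, action) rules in priority order, one per root-to-leaf path."""
    match = match or {}
    if fdd[0] == "leaf":
        return [(match, fdd[1])]
    _, field, value, hi, lo = fdd
    # Rules from the true branch come first and carry the positive test.
    # Rules from the false branch omit the test and rely on lower priority.
    return to_table(hi, {**match, field: value}) + to_table(lo, match)


def lookup(table, packet):
    """Return the action of the first (highest-priority) matching rule."""
    for match, action in table:
        if all(packet.get(f) == v for f, v in match.items()):
            return action
    return None


def evaluate(fdd, packet):
    """Walk the diagram directly, for comparison with the table."""
    while fdd[0] == "node":
        _, field, value, hi, lo = fdd
        fdd = hi if packet.get(field) == value else lo
    return fdd[1]


# Hypothetical FDD: forward two destinations, drop everything else.
fdd = ("node", "dst", "10.0.0.1", ("leaf", "port 1"),
       ("node", "dst", "10.0.0.2", ("leaf", "port 2"), ("leaf", "drop")))
table = to_table(fdd)
for rule in table:
    print(rule)
```

The table agrees with the diagram because every packet that satisfies a test is caught by some rule from the true branch before the false-branch rules are consulted.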

Global compiler. The benchmarks discussed so far only use the 
local compiler. In this section, we focus on the global compiler. 
Since the global compiler introduces new abstractions, we can’t 
apply it to existing benchmarks, such as SDX, which use local 
programs. Instead, we need to build our own benchmark suite of 
global programs. To do so, we build a generator that produces 
global programs that describe paths between hosts. Again, an n-node 
topology has O(n^2) paths. We apply this generator to the 
Topology Zoo, measuring compilation time and table size: 

• Compilation time: since the global compiler leverages FDDs, we 
can expect automaton generation to be fast. However, global 
compilation involves other steps such as determinization and 
localization, and their effects on compilation time may matter. 
Figure 13c shows how compilation time varies with the total 
number of rules generated. This graph does grow faster than 
local compilation time on the same benchmark (the red, dashed 
line in Figure 12). Switch specialization, which dramatically 
reduces the size of FDDs and hence compilation time, does not 
work on global programs. Therefore, it makes most sense to 
compare this graph to local compilation with a single FDD. 

• Table size: the global compiler has some optimizations to 
eliminate unnecessary states, which produces fewer rules. However, 
it does not fully minimize NetKAT automata, so it may produce 
more rules than equivalent local programs. Figure 13b 
shows that on the Topology Zoo, global routing produces tables 
that are no more than twice as large as local routing. 

We believe these results are promising; we spent a lot of time tuning 
the local compiler, but the global compiler is an early prototype 
with much room for improvement. 

Figure 14: Three fabrics optimizing different metrics 

Virtualization case study. Finally, we present a small case study 
that showcases the virtual compiler on a snapshot of the AT&T 
backbone network circa 2007-2008. This network is part of the 
Topology Zoo and is shown in Figure 14. We construct a “one big 
switch” virtual network and use it to connect five nodes 
(highlighted in green) to each other: 

\[ \sum_{n=1}^{5} \mathit{dst} = 10.0.0.n \cdot \mathit{pt} \leftarrow n \] 
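
Read as a NetKAT term, the sum is a union of five clauses, each of which filters on one destination address and then sets the virtual port. The following minimal sketch of that reading assumes packets are modeled as dictionaries and returns the list of output packets; the function name `big_switch` is hypothetical.

```python
def big_switch(packet, hosts=5):
    """Evaluate the union over n = 1..hosts of (dst = 10.0.0.n ; pt <- n) on one packet."""
    outputs = []
    for n in range(1, hosts + 1):
        if packet["dst"] == f"10.0.0.{n}":       # filter: dst = 10.0.0.n
            outputs.append({**packet, "pt": n})  # modification: pt <- n
    return outputs


print(big_switch({"dst": "10.0.0.3", "pt": 1}))  # [{'dst': '10.0.0.3', 'pt': 3}]
print(big_switch({"dst": "10.0.0.9", "pt": 1}))  # [] (no clause matches, so the packet is dropped)
```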

To map the virtual network to the physical network, we generate 
three different fabrics: (a) a fabric that minimizes the total number 
of links used across the network, (b) a fabric that minimizes the 
number of hops between hosts, and (c) a fabric that minimizes the 
physical length of the path between hosts. In the figure, the links 
utilized by each of these fabrics are highlighted in red. 

The three fabrics give rise to three very different 
implementations of the same virtual program. Note that the program 
and the fabric are completely independent of each other and can be 
updated independently. For example, the operator managing the 
physical network could change the fabric to implement a new SLA, e.g., 
move from minimum-utilization to shortest-paths. This change 
requires no update to the virtual program; the network would witness 
performance improvement for free. Similarly, the virtual network 
operator could decide to implement a new firewall policy in the 
virtual network or change the forwarding behavior. The old fabric 
would work seamlessly with this new virtual program without 
intervention by the physical network operator. In principle, our 
compiler could even be used repeatedly to virtualize virtual networks. 

7. Related Work 

A large body of work has explored the design of high-level 
languages for SDN programming [8, 19, 24, ...]. Our work 
is unique in its focus on the task of engineering efficient compilers 
that scale up to large topologies as well as expressive global and 
virtual programs. 

An early paper by Monsanto et al. proposed the NetCore 
language and presented an algorithm for compiling programs based 
on forwarding tables [?]. Subsequent work by Guha et al. 
developed a verified implementation of NetCore in the Coq proof 
assistant [?]. Anderson et al. developed NetKAT as an extension to 
NetCore and proposed a compilation algorithm based on 
manipulating nested conditionals, which are essentially equivalent to 
forwarding tables. The correctness of the algorithm was justified 
using NetKAT’s equational axioms, but the algorithm didn’t handle 
global programs or Kleene star. Concurrent NetCore [30] extends 
NetCore with features that target next-generation SDN switches. The 
original Pyretic paper implemented a “reactive microflow 
interpreter” and not a compiler [?]. However, later work developed a 
compiler in the style of NetCore. SDX uses Pyretic to program 
Internet exchange points [?]. CoVisor develops incremental 
algorithms for maintaining forwarding tables in the presence of 
changes to programs composed using NetCore-like operators [?]. 
Recent work by Jose et al. developed a compiler based on integer 
linear programming for next-generation switches, each with multiple, 
programmable forwarding tables [?]. 

A number of papers in the systems community have proposed 
mechanisms for implementing virtual network programs. An early 
workshop paper by Casado proposed the idea of network 
virtualization and sketched an implementation strategy based on a 
hypervisor [?]. Our virtual compiler extends this basic strategy by 
introducing a generalized notion of a fabric, developing concrete 
algorithms for computing and selecting fabrics, and showing how to 
compose fabrics with virtual programs in the context of a high-level 
language. Subsequent work by Koponen et al. described VMware’s 
NVP platform, which implements hypervisor-based virtualization 
in multi-tenant datacenters [?]. Pyretic [25], CoVisor [?], and 
OpenVirteX [3] all support virtualization, the latter at three 
different levels of abstraction: topology, address, and control 
application. However, none of these papers present a complete 
description of algorithms for computing the forwarding state needed 
to implement virtual networks. 

The FDDs used in our local compiler as well as our algorithms 
for constructing NetKAT automata are inspired by Pous’s work on 
symbolic KAT automata [27] and work by some of the authors on 
a verification tool for NetKAT [?]. The key difference between 
that work and ours is that they focus on verification of programs 
whereas we develop compilation algorithms. BDDs have been used 
for verification for several decades [?]. In the context of 
networks, BDDs and BDD-like structures have been used to optimize 
access control policies [21] and TCAMs [22], and to verify [17] data 
plane configurations, but our work is the first to use BDDs to 
compile network programs. 











8. Conclusion 

This paper describes the first complete compiler for the NetKAT 
language. It presents a suite of tools that leverage BDDs, graph 
algorithms, and symbolic automata to efficiently compile programs 
in the NetKAT language down to compact forwarding tables for 
SDN switches. In the future, we plan to investigate whether richer 
constructs such as stateful and probabilistic programs can be 
implemented using our techniques, how classic algorithms from the 
automata theory literature can be adapted to optimize global 
programs, how incremental algorithms can be incorporated into our 
compiler, and how the compiler can assist in performing graceful 
dynamic updates to network state. 